ABSTRACT

In this paper we investigate the decoding of parallel turbo codes
over the binary erasure channel, suited for upper-layer error
correction. The proposed algorithm performs "on-the-fly" decoding,
i.e. it starts decoding as soon as the first symbols are received.
This algorithm is comparable to the iterative decoding of codes
defined on graphs, in that it propagates in the trellises of the
turbo code by removing transitions in the same way edges are removed
in a bipartite graph under message-passing decoding. A performance
comparison with LDPC codes for different coding rates is shown.



1. INTRODUCTION 

The binary erasure channel (BEC), introduced by Elias [1], is one of
the simplest channel models: a symbol is either erased with
probability p, or exactly received with probability 1 - p. The
capacity of such a channel with a uniform source is given by:

\[ C = 1 - p \]

Codes that achieve this capacity are called Maximum-Distance
Separable (MDS) codes, and they can recover the K information symbols
from any K of the N codeword symbols. An MDS code that is widely used
over the BEC is the non-binary Reed-Solomon (RS) code, but its block
length is limited by the cardinality of the Galois field, which
dramatically increases the decoding complexity. For large block
lengths, low-density parity-check (LDPC) codes [2], [3], [4], [5] and
repeat-accumulate (RA) codes [6] with message-passing decoding have
proved to perform very close to the channel capacity with reasonable
complexity. Moreover, "rateless" codes [7], [8], which are capable of
generating an infinite sequence of parity symbols, were proposed for
the BEC. Their main strength is their high performance together with
linear-time encoding and decoding. However, convolutional-based
codes, which are widely used for Gaussian channels, have been less
investigated for the BEC. Among the few papers that treat
convolutional and turbo codes in this context are [10], [11], [12],
[13], [14].

In practical systems, data packets received at the upper layers
encounter erasures. In the Internet for instance, datagrams are
frequently discarded by the physical layer cyclic redundancy check
(CRC) or forward error correction (FEC), or even by the transport
level user datagram protocol (UDP) checksums. Another example is
transmission links that exhibit deep fading of the signal (fades of
10 dB or more) for short periods. This is the case of the satellite
channel, where weather conditions (especially rain) severely degrade
the channel quality, or of mobile transmissions affected by terrain
effects. In such situations, the physical layer FEC fails and we can
either ask for re-transmission (only if a return channel exists, and
at a penalty in broadcast/multicast scenarios) or use upper layer
(UL) FEC.

In this paper, we propose a minimum-delay decoding algorithm for
turbo codes suited for UL-FEC, in the sense that the decoding starts
as soon as the first symbols are received, where a symbol can be a
bit or a packet. The paper is organized as follows: Section 2 gives
the system model and briefly recalls the existing decoding
algorithms. Section 3 explains the minimum-delay decoding algorithm.
Simulation results and comparisons with LDPC codes are shown in
Section 4, and Section 5 gives the concluding remarks.

2. SYSTEM MODEL AND NOTATIONS 

We consider the transmission of a parallel turbo code [9] with rate
R_c = K/N over the BEC. An information bit sequence of length K is
fed to a recursive systematic convolutional (RSC) code with rate
ρ = k/n to generate a first parity bit sequence. The same information
sequence is scrambled via an interleaver Π and encoded to generate a
second parity sequence. With half-rate RSC constituents, the
resulting turbo code has rate 1/3. In order to raise the rate of the
turbo code, parity bits are punctured. In this paper, we consider
rate-1/3, punctured rate-1/2 and punctured rate-2/3 turbo codes. The
decoding of turbo codes is performed iteratively using probabilities
on information bits, which requires the reception of the entire
codeword before the decoding process starts. For instance, the
soft-input soft-output (SISO) "Forward-Backward" (FB) algorithm [15],
optimal in terms of a posteriori probability (APP) on symbols,
consists of one forward recursion and one backward recursion over the
trellis of each of the two constituent codes. As turbo codes are
classically used over Gaussian channels, a SISO algorithm (the FB
algorithm or a sub-optimal variant) is required to attain low error
rates. Exchanging hard information between the constituent codes
using an algorithm such as the well-known Viterbi Algorithm (VA) [16]
(which is a Maximum-Likelihood Sequence Estimator (MLSE) for
convolutional codes) is severely penalizing. However, in the case of
the BEC, a SISO decoding algorithm is not necessary. In fact, it has
been shown in [12] that the VA is optimal in terms of symbol (or bit)
probability on the BEC, which means that one can achieve optimal
decoding of turbo codes on the BEC without using soft information. In
other words, if a bit is known to (or correctly decoded by) one
trellis, its value cannot be modified by the other trellis. Motivated
by this key property, we propose a decoding algorithm for turbo
codes based on hard information exchange.
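
As a rough illustration of the encoder described in this section, the following Python sketch generates the systematic and parity sequences of a rate-1/3 parallel turbo code built from two RSC (7, 5)_8 constituents. It is only a sketch under stated assumptions: the function names, the random permutation standing in for the interleaver Π, the alternating puncturing pattern, and the omission of trellis termination are illustrative choices that are not specified in the text.

```python
import random

def rsc_7_5_parity(bits):
    """Parity bits of the rate-1/2 RSC (7, 5)_8 code, starting in the zero state."""
    s1 = s2 = 0                       # s1: most recent register bit, s2: oldest
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2               # feedback polynomial 7 = 1 + D + D^2
        parity.append(a ^ s2)         # feedforward polynomial 5 = 1 + D^2
        s1, s2 = a, s1                # shift the register
    return parity

def turbo_encode(info, perm):
    """Return (systematic, parity 1, parity 2); bit i sits at position perm[i] in trellis 2."""
    interleaved = [0] * len(info)
    for i, j in enumerate(perm):
        interleaved[j] = info[i]
    return list(info), rsc_7_5_parity(info), rsc_7_5_parity(interleaved)

def puncture_alternating(parity1, parity2):
    """Hypothetical pattern: keep parity 1 at even steps and parity 2 at odd steps."""
    return [p1 if t % 2 == 0 else p2
            for t, (p1, p2) in enumerate(zip(parity1, parity2))]

rng = random.Random(0)
K = 8                                 # illustrative block length
info = [rng.randint(0, 1) for _ in range(K)]
perm = list(range(K))
rng.shuffle(perm)                     # stand-in for the interleaver (assumption)

systematic, parity1, parity2 = turbo_encode(info, perm)
n_full = len(systematic) + len(parity1) + len(parity2)
n_punctured = len(systematic) + len(puncture_alternating(parity1, parity2))
print(K / n_full, K / n_punctured)    # coding rates 1/3 and 1/2
```

Because the code is systematic, every information bit appears unchanged in the codeword, which is what allows a bit known in one trellis to be copied as a hard value into the other.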



Fig. 1. Transitions of the half-rate four-state RSC (7, 5)_8 code;
each transition is labeled b_1 b_2.



3. ON-THE-FLY DECODING OF TURBO CODES 

The turbo code has two trellises that have K steps each, and one
codeword represents a path in the trellises. In order to minimize the
decoding delay, we propose an algorithm that starts decoding directly
after the reception of the first bits of the transmitted codeword.
First, at every step of the trellises, if one of the n bits of the
binary labeling is received (i.e. is known), we remove the
transitions that do not cover this bit. If, at some step in the
trellis, all the transitions leaving a state e_i on the left are
removed, we then know that no transition can arrive at this state at
the previous step. Consequently, all the transitions arriving at
state e_i from the left are removed. Similarly, if, at some step,
there are no transitions arriving at a state e_j on the right, we
cannot leave state e_j at the following step, and all the transitions
outgoing from state e_j are removed. This way the information
propagates in the trellis and some bits can be determined without
being received. This algorithm is inspired by the message-passing
decoding of LDPC codes over the BEC, where edges connected to a
variable node are removed if this variable is received.
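
The pruning rule above can be tried on a single trellis with a few lines of Python. The sketch below is an illustration, not the decoder itself: it stores each trellis step as a set of transitions (state, next state, b_1, b_2) for the RSC (7, 5)_8 code, uses 0-based state indices 0..3 for e_1..e_4, and the helper names are hypothetical. The first step is assumed to start in the zero state, and the only received bits are b_1 = 0 and b_2 = 1 at the second step.

```python
def rsc_transitions():
    """All (state, next_state, b1, b2) transitions of the RSC (7, 5)_8 code."""
    transitions = set()
    for state in range(4):                 # state index 0..3 stands for e_1..e_4
        s1, s2 = state >> 1, state & 1
        for u in (0, 1):
            a = u ^ s1 ^ s2                # feedback polynomial 7
            transitions.add((state, (a << 1) | s1, u, a ^ s2))
    return transitions

def receive(step, position, value):
    """Keep the transitions whose bit b1 (position 0) or b2 (position 1) equals value."""
    return {tr for tr in step if tr[2 + position] == value}

def propagate(trellis):
    """Remove transitions that touch a state with nothing on its other side, until stable."""
    changed = True
    while changed:
        changed = False
        for t in range(len(trellis) - 1):
            arriving = {tr[1] for tr in trellis[t]}        # states reached at step t
            leaving = {tr[0] for tr in trellis[t + 1]}     # states left at step t + 1
            alive = arriving & leaving
            left = {tr for tr in trellis[t] if tr[1] in alive}
            right = {tr for tr in trellis[t + 1] if tr[0] in alive}
            if left != trellis[t] or right != trellis[t + 1]:
                trellis[t], trellis[t + 1] = left, right
                changed = True
    return trellis

full = rsc_transitions()
# Three consecutive steps; the first one is known to start in the zero state.
trellis = [{tr for tr in full if tr[0] == 0}, set(full), set(full)]
trellis[1] = receive(trellis[1], 0, 0)     # information bit b1 = 0 at step 1
trellis[1] = receive(trellis[1], 1, 1)     # parity bit b2 = 1 at step 1
propagate(trellis)
print({tr[2] for tr in trellis[0]})        # -> {1}
```

In this small example the information bit of the first step is recovered (it must be 1) although neither bit of that step was received, which is the effect the propagation is designed to produce.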

Now at some stage of the decoding process, if an information bit is
determined in one trellis without being received, we set its
interleaved (or de-interleaved) counterpart as known and the same
propagation is triggered in the other trellis. The information
exchange between the two trellises continues until propagation stops
in both trellises. This way we can recover all the transmitted
information bits without receiving the whole transmitted codeword.

In the sequel, for the sake of clarity, we will only consider
parallel turbo codes built from the concatenation of two RSC codes
with generator polynomials (7, 5) in octal (the polynomial (7)_8
being the feedback polynomial), constraint length L = 3, and coding
rate ρ = k/n = 1/2; this code has a simple trellis structure with
four states. The algorithm can be applied to any parallel turbo code
built from other RSC constituents. The transitions of the RSC
(7, 5)_8 code between two trellis steps are shown in Fig. 1. As the
code is systematic, the bit b_1 represents the information bit, and
the bit b_2 the parity bit. There are 2^k = 2 transitions leaving and
2 transitions arriving at each state. The transitions between two
steps of the trellis can be represented by a 2^{L-1} × 2^{L-1} matrix
(a 4 × 4 matrix in this case). For the (7, 5)_8 code for instance,
the transition table (rows: departure state, columns: arrival state,
entries: b_1 b_2) is given by:

\[
\begin{array}{c|cccc}
    & e_1 & e_2 & e_3 & e_4 \\ \hline
e_1 & 00  & X   & 11  & X   \\
e_2 & 11  & X   & 00  & X   \\
e_3 & X   & 10  & X   & 01  \\
e_4 & X   & 01  & X   & 10
\end{array}
\]

where an X means that the transition does not exist. For the needs of
the proposed algorithm, we will use the transition table of the code
to build binary transition matrices T_{xx}, T_{b_1 x}, and T_{x b_2}
with b_1, b_2 ∈ {0, 1} that contain the allowed transitions depending
on the known bits. These matrices are stored at the decoder and used
as look-up tables throughout the decoding process. For instance, if
the two bits of the transition are unknown, we define the matrix:

\[
T_{xx} = \begin{bmatrix}
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1
\end{bmatrix}
\]

where a one in position (i, j) means that there is a transition
between state e_i and state e_j, and a zero means that no transition
exists. However, if b_1 = 0 and b_2 is unknown, or if b_1 is unknown
and b_2 = 0, we define the following matrices corresponding to the
allowed transitions:

\[
T_{0x} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0
\end{bmatrix},
\qquad
T_{x0} = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

We build the other matrices similarly. Note that there are a total of
3^n matrices, each of size 2^{L-1} × 2^{L-1}.
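
The look-up matrices can be generated mechanically from the encoder equations. The Python sketch below is one way to do it, assuming the usual shift-register realization of the RSC (7, 5)_8 code with state index 2·s1 + s2 (0-based, so row 0 stands for e_1); the function name and the string keys such as '0x' are illustrative choices.

```python
def build_tables():
    """Return (table, T) for the RSC (7, 5)_8 code.

    table[i][j] is the label 'b1b2' of the transition from row state i to
    column state j, or 'X' if it does not exist. T['b1b2'] is the 4x4 binary
    matrix of allowed transitions, where each bit is '0', '1' or 'x' (unknown).
    """
    n_states = 4
    table = [['X'] * n_states for _ in range(n_states)]
    for state in range(n_states):            # row index 0..3 stands for e_1..e_4
        s1, s2 = state >> 1, state & 1
        for u in (0, 1):
            a = u ^ s1 ^ s2                   # feedback polynomial 7 = 1 + D + D^2
            table[state][(a << 1) | s1] = f"{u}{a ^ s2}"   # parity from 5 = 1 + D^2

    def allowed(label, b1, b2):
        # A transition is kept if it exists and agrees with every known bit.
        return int(label != 'X'
                   and b1 in ('x', label[0]) and b2 in ('x', label[1]))

    T = {b1 + b2: [[allowed(label, b1, b2) for label in row] for row in table]
         for b1 in '01x' for b2 in '01x'}
    return table, T

table, T = build_tables()
for row in table:
    print(' '.join(f"{label:>2}" for label in row))
print(len(T))                                 # 3^n = 9 matrices for n = 2
print(T['0x'])
```

With n = 2 labeling bits this yields the 3^n = 9 matrices mentioned above, including the fully known cases such as T['01'], which contain at most one transition per departure state.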



On-the-fly decoding algorithm 

1) Initialization step. We consider matrices M_1(i) and M_2(j)
corresponding to transitions at steps i and j of the two trellises of
the constituent codes. These matrices are initialized as follows:

\[ M_1(t) = M_2(t) = T_{xx}, \qquad t = 2, \ldots, K-1 \]

The matrices at steps 0 and 1 (namely M_1(0), M_2(0), M_1(1), and
M_2(1)) represent the fact that any codeword starts in the zero
state. The matrices at steps K and K + 1 represent the two steps
required for trellis termination (i.e. ending in the zero state).

2) Reception step. Each time a bit r ∈ {0, 1} is received:

• If r is an information bit, it is placed in the appropriate
positions in both trellises as r_{2i} = r_{2j} = r, where j = Π(i).
We then compute:

\[ M_1(i) = M_1(i) \wedge T_{rx} \quad \text{and} \quad M_2(j) = M_2(j) \wedge T_{rx} \]

where the ∧ operator is a logical AND between corresponding entries
of the two matrices. In other words, we only keep the transitions in
M_1(i) and M_2(j) with b_1 = r.

• If r is a parity bit, we set r_{2i+1} = r if r belongs to the
first trellis, or r_{2j+1} = r if r belongs to the second one. We
then compute:

\[ M_1(i) = M_1(i) \wedge T_{xr} \quad \text{or} \quad M_2(j) = M_2(j) \wedge T_{xr} \]

3) Propagation step. If either M_1(i) or M_2(j) has at least one
all-zero column or one all-zero row, the algorithm is able to
propagate in either direction in either trellis using the following
rule:

• Let d ∈ {1, 2} represent the trellis index and initialize a counter
t ∈ {i, j} representing the step index through each trellis.

• Left propagation: an all-zero row with index u in M_d(t) generates
an all-zero column with index u in M_d(t - 1).

• Right propagation: an all-zero column with index v in M_d(t)
generates an all-zero row with index v in M_d(t + 1).

If we get new all-zero columns or new all-zero rows at steps t ± 1,
we set t ← t ± 1 and continue the propagation (Step 3).

4) Duplication step. If during the propagation we get some
M_d(t) ⊆ T_{bx} (i.e. the value of the information bit of the t-th
transition in the d-th trellis is equal to b), we proceed as follows:

• If M_1(t) ⊆ T_{bx}, we compute:

\[ M_2(\Pi(t)) = M_2(\Pi(t)) \wedge T_{bx} \]

and then we propagate from Π(t) in the second trellis (Step 3).

• If M_2(t) ⊆ T_{bx}, we compute:

\[ M_1(\Pi^{-1}(t)) = M_1(\Pi^{-1}(t)) \wedge T_{bx} \]

and then we propagate from Π^{-1}(t) in the first trellis (Step 3).

5) New reception step. If the propagation in both trellises 
stops, we go back to step 2. 

6) Decoding stop. The decoding is successful if M_1(i) ⊆ T_{bx} for
all i ∈ {0, ..., K - 1}. We then define the inefficiency ratio μ as
follows:

\[ \mu = \frac{r_{\mathrm{stop}}}{K} \]

where r_stop ≥ K is the number of bits received at the moment when
the decoding stops.

An illustration of the proposed algorithm is shown in Fig. 2. First,
at the reception of an information bit b_1 = 0, we remove the
transitions in the corresponding step in the trellis where b_1 = 1.
Note that this step is done in interleaved positions in both
trellises at the reception of an information bit. At this stage, no
propagation in the trellis is possible as all the states are still
connected. Next we receive a parity bit b_2 = 1; the remaining
transitions corresponding to b_2 = 0 are removed. At that point, we
notice that states e_1 and e_2 on the left are not connected. This
means that the transitions arriving from the left at these states
are not allowed anymore, thus they are removed. Similarly, we remove
the transitions leaving the states e_1 and e_3 on the right.
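
Steps 1 to 6 can be put together in a compact Python sketch. It is a simplified illustration rather than a reference implementation, with the following assumptions: each matrix M_d(t) is stored as a set of (state, next state) pairs, so that set intersection plays the role of the ∧ operator and an all-zero row or column corresponds to a state that no longer appears in the set; the trellises are not terminated (the steps K and K + 1 are omitted); states are 0-based; symbols arrive in a random order; and all class and function names are hypothetical.

```python
import random

def trellis_tables():
    """labels[(s, s_next)] = (b1, b2); T[(b1, b2)] = allowed transitions, 'x' = unknown."""
    labels = {}
    for s in range(4):
        s1, s2 = s >> 1, s & 1
        for u in (0, 1):
            a = u ^ s1 ^ s2                              # feedback polynomial 7
            labels[(s, (a << 1) | s1)] = (u, a ^ s2)     # parity from polynomial 5
    T = {(b1, b2): {tr for tr, (u, p) in labels.items()
                    if b1 in ('x', u) and b2 in ('x', p)}
         for b1 in (0, 1, 'x') for b2 in (0, 1, 'x')}
    return labels, T

def rsc_parity(bits):
    """Parity bits of the RSC (7, 5)_8 encoder started in the zero state."""
    s1 = s2 = 0
    out = []
    for u in bits:
        a = u ^ s1 ^ s2
        out.append(a ^ s2)
        s1, s2 = a, s1
    return out

class OnTheFlyDecoder:
    def __init__(self, K, perm):
        self.K, self.perm = K, perm          # bit i sits at step perm[i] of trellis 2
        self.inv = sorted(range(K), key=perm.__getitem__)   # inverse permutation
        self.labels, self.T = trellis_tables()
        self.received = 0
        # Step 1: all transitions allowed, except that step 0 leaves the zero state.
        self.M = [[set(self.T[('x', 'x')]) for _ in range(K)] for _ in range(2)]
        for d in (0, 1):
            self.M[d][0] = {tr for tr in self.M[d][0] if tr[0] == 0}
            self.propagate(d, 0)

    def info_bit(self, d, t):
        """Value of the information bit at step t of trellis d, or None if undetermined."""
        values = {self.labels[tr][0] for tr in self.M[d][t]}
        return values.pop() if len(values) == 1 else None

    def receive(self, kind, t, r):
        """Step 2: kind is 'info', 'parity1' or 'parity2'; t is the step in its own trellis."""
        d = 1 if kind == 'parity2' else 0
        self.M[d][t] &= self.T[(r, 'x')] if kind == 'info' else self.T[('x', r)]
        self.received += 1
        self.propagate(d, t)

    def propagate(self, d, t):
        """Steps 3 and 4: spread removals left/right and duplicate determined bits."""
        work = [(d, t)]
        while work:
            d, t = work.pop()
            M = self.M[d]
            starts = {i for i, _ in M[t]}    # rows of M_d(t) that are not all-zero
            ends = {j for _, j in M[t]}      # columns of M_d(t) that are not all-zero
            if t > 0:
                kept = {tr for tr in M[t - 1] if tr[1] in starts}
                if kept != M[t - 1]:
                    M[t - 1] = kept
                    work.append((d, t - 1))
            if t + 1 < self.K:
                kept = {tr for tr in M[t + 1] if tr[0] in ends}
                if kept != M[t + 1]:
                    M[t + 1] = kept
                    work.append((d, t + 1))
            b = self.info_bit(d, t)
            if b is not None:                # duplication in the other trellis
                other, pos = (1, self.perm[t]) if d == 0 else (0, self.inv[t])
                kept = self.M[other][pos] & self.T[(b, 'x')]
                if kept != self.M[other][pos]:
                    self.M[other][pos] = kept
                    work.append((other, pos))

    def decoded(self):
        """Step 6: the K information bits once all are determined, else None."""
        bits = [self.info_bit(0, t) for t in range(self.K)]
        return None if None in bits else bits

rng = random.Random(1)
K = 200                                      # illustrative interleaver size
perm = rng.sample(range(K), K)               # stand-in for the interleaver
info = [rng.randint(0, 1) for _ in range(K)]
decoder = OnTheFlyDecoder(K, perm)
interleaved = [info[i] for i in decoder.inv]
symbols = ([('info', t, b) for t, b in enumerate(info)]
           + [('parity1', t, b) for t, b in enumerate(rsc_parity(info))]
           + [('parity2', t, b) for t, b in enumerate(rsc_parity(interleaved))])
rng.shuffle(symbols)                         # random arrival order (assumption)
for symbol in symbols:
    if decoder.decoded() is None:            # Step 5: keep receiving until done
        decoder.receive(*symbol)
print(decoder.decoded() == info, decoder.received / K)
```

The last printed value is the inefficiency ratio μ of this particular run; averaging it over many random information words, interleavers and arrival orders would give an estimate of the average inefficiency for this simplified, unterminated setup.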

Fig. 2. On-the-fly decoding; removed edges are dashed: (a) Trellis
after the reception of the source bit b_1 = 0, (b) Trellis after the
reception of the parity bit b_2 = 1, (c) Trellis after left and right
propagation.

In fact, the average decoding inefficiency μ_av of the code relates
to its erasure recovery capacity as follows: suppose that, on
average, the proposed decoding algorithm requires K' (K' ≥ K) symbols
to be able to recover the K information symbols. We can then write:

\[ \mu_{av} = \frac{K'}{K} = \frac{(1 - p_{th}) N}{K} = \frac{1 - p_{th}}{R_c} \]

where the threshold probability p_th corresponds to the average
fraction of erasures the decoder can recover. We can then write p_th
as:

\[ p_{th} = 1 - \mu_{av} R_c \]

For instance, if a code with R_c = 1/3 has μ_av = 1.076, it has
p_th = 0.641. As a code with this coding rate is theoretically
capable of correcting a probability of erasure of p = 2/3, the gap to
capacity is:

\[ \Delta_p = p - p_{th} \approx 0.025 \]
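
The arithmetic of this example is easy to reproduce. The short Python sketch below evaluates p_th = 1 - μ_av R_c and the gap to capacity for R_c = 1/3 and μ_av = 1.076; the function name is an illustrative choice.

```python
def erasure_threshold(mu_av, rate):
    """Average fraction of erasures the decoder can recover: p_th = 1 - mu_av * R_c."""
    return 1.0 - mu_av * rate

rate = 1 / 3                       # coding rate R_c
capacity_limit = 1 - rate          # largest erasure probability a rate-R_c code can handle
p_th = erasure_threshold(1.076, rate)
gap = capacity_limit - p_th
print(round(p_th, 3), round(gap, 3))   # -> 0.641 0.025
```

An ideal code with μ_av = 1 gives p_th = 1 - R_c, i.e. a zero gap.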



4. SIMULATION RESULTS 

In this section, the performance of the proposed algorithm with
parallel turbo codes is shown. The coding rate of the turbo code
using half-rate constituent codes is R_c = 1/3. However, we also
consider turbo codes with R_c = 1/2 and R_c = 2/3 obtained by
puncturing the R_c = 1/3 turbo code. We use two types of
interleavers: 1) pseudo-random (PR) interleavers (not optimized) and
2) quasi-cyclic (QC) bi-dimensional interleavers [17], which are the
best known interleavers in the literature: in fact, it was shown in
[18] that the minimum distance d_min of a turbo code is
upper-bounded by a quantity that grows logarithmically with the
interleaver size K, and the QC interleavers always achieve this
bound.

The comparison is made with regular and irregular staircase LDPC
codes. An LDPC code is said to be staircase if the right-hand side of
its parity-check matrix consists of a double diagonal. The advantage
of a staircase LDPC code is that the encoding can be performed in
linear time using the parity-check matrix, so there is no need for
the generator matrix, which is generally not low density. A staircase
LDPC code is said to be regular if the left-hand side of the
parity-check matrix is regular, i.e. the number of 1's per column is
constant. Otherwise it is said to be irregular. In this section, we
consider regular staircase LDPC codes with four 1's in each
left-hand-side column. Irregular staircase LDPC codes are optimized
for the BEC by density
evolution. In Fig. 3 we compare the performance of turbo codes and
LDPC codes for R_c = 1/3. Turbo codes with RSC (7, 5)_8 constituents
and PR interleaving achieve an average inefficiency μ_av of about
1.09, which means they require K' = 1.09K received bits (or 9%
overhead) to be able to recover the K information bits. However,
using a QC interleaver, the overhead with the same turbo code is
about 7.6%, which is very close to the irregular staircase LDPC code,
while the regular staircase LDPC code is far above regular turbo
codes (16% overhead). In addition, it is important to note that using
turbo codes with RSC constituents with L = 4 (trellis with eight
states), namely the RSC (13, 15)_8 and the (17, 15)_8 codes,
increases the decoding complexity without improving the performance.
Punctured half-rate turbo codes are compared with half-rate LDPC
codes in Fig. 4. Again, turbo codes with QC interleavers largely
outperform regular LDPC codes (6% to 11% overhead), and thus perform
closer to irregular LDPC codes (5% overhead). However, puncturing the
turbo code even more to raise its rate to R_c = 2/3 widens the gap
with irregular LDPC codes, placing the performance curve with QC
interleaving (5.2% overhead) at equal distance from regular LDPC
codes (7.5%) and irregular LDPC codes (3%).

Fig. 3. Average inefficiency (μ_av) with respect to interleaver size
K over the BEC. Turbo code with half-rate RSC constituents versus
LDPC codes, R_c = 1/3.

Fig. 4. Average inefficiency (μ_av) with respect to interleaver size
K over the BEC. Turbo code with half-rate (7, 5)_8 RSC constituents
versus LDPC codes, punctured to R_c = 1/2.

Fig. 5. Average inefficiency (μ_av) with respect to interleaver size
K over the BEC. Turbo code with half-rate (7, 5)_8 RSC constituents
versus LDPC codes, punctured to R_c = 2/3.

With codes such as LDPC or turbo codes, it is possible to achieve
near-capacity performance with iterative decoding, with μ_av ≈ 1.
Ideally, an MDS code (that achieves capacity) has μ_av = 1, i.e. it
is capable of recovering the K information symbols from any K
received symbols out of the N codeword symbols.

Finally, it is important to note that the proposed algorithm is
linear in the interleaver size K. In fact, an RSC code with 2^{L-1}
states and 2^k transitions leaving each state has
2^{L-1} × 2^k = 2^{k+L-1} transitions between two trellis steps. This
means that the turbo code has a total of approximately
2 × K × 2^{k+L-1} transitions. Even if the decoding is exponential in
k and L, it is linear in K. As we can obtain very powerful turbo
codes with relatively small k and L, we can say that a turbo code
with the proposed algorithm has linear-time encoding and decoding,
and thus it is suited for applications where low-complexity
"on-the-fly" encoding/decoding is required (as with the "Raptor
codes" [8] for instance).
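
The complexity argument for the proposed decoder rests on the number of trellis transitions, about 2 × K × 2^{k+L-1} for the two constituent trellises. The Python sketch below tabulates this count for k = 1 with L = 3 (four states) and L = 4 (eight states); the function names and the interleaver sizes are illustrative assumptions.

```python
def transitions_per_step(k, L):
    """Transitions between two trellis steps: 2^(L-1) states times 2^k branches."""
    return 2 ** (L - 1) * 2 ** k

def turbo_transitions(K, k, L):
    """Approximate total over the two constituent trellises of K steps each."""
    return 2 * K * transitions_per_step(k, L)

for K in (1000, 2000, 4000):                 # illustrative interleaver sizes
    print(K, turbo_transitions(K, k=1, L=3), turbo_transitions(K, k=1, L=4))
```

The count doubles when K doubles and also doubles when going from L = 3 to L = 4, which is consistent with the remark that eight-state constituents increase the decoding complexity.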



5. CONCLUSION 

In this paper we proposed a novel decoding algorithm for turbo codes
over the BEC. This algorithm, characterized by "on-the-fly"
propagation in the trellises and hard information exchange between
the two codes, is appropriate for UL-FEC. Performance results with
very small overhead were shown for different interleaver sizes and
coding rates. Although the turbo codes presented in this paper were
not optimized for the BEC, the results are very promising. Further
improvements can be made by optimizing turbo codes for this channel.